Workflow-based Services: Infrastructure for Scientific Applications
Abstract
The way scientists work in the traditional sciences has changed drastically in recent years. Computer science increasingly supports them in performing and analyzing their experiments. Today, obtaining the raw data from instruments is merely the first step. The data is no longer analyzed on paper or with simple computational tools; instead, massive amounts of raw instrument data are processed in complex and long-running computational pipelines. This trend of supporting the traditional sciences with computational tools has led to a significant speedup in executing experiments and has also enabled experiments that would not have been possible before. Scientists increasingly depend on adequate infrastructure to process experiment data in complex computational pipelines and to manage the plethora of data these pipelines use and produce. Such pipelines are typically modeled as workflows, and this trend consequently challenges the current infrastructure for executing workflows as well as the infrastructure for managing the resulting data deluge. The work presented in this book addresses some of the challenges arising from this trend.

The first challenge this book addresses is the sharing of scientific computations. For the scientific computations and workflows developed today, publishing the methodology alone is no longer enough for others to use it: implementing such workflows and setting up the appropriate environment is far too difficult, so the computations or workflows themselves must be shared. State-of-the-art approaches use proprietary interfaces to make such workflows available as services for others to use. Proprietary interfaces, however, make it difficult to integrate such computations, and we therefore use standardized interfaces. We describe in detail how we have mapped a workflow to such a standardized interface and report on an efficient implementation.

The second challenge this book examines is how to provide adequate infrastructure for scientific workflows. One considerable advantage of scientific workflows is that they can be parameterized, so a single workflow can be run several thousand times to process vast numbers of raw data sets obtained from instruments. Such large numbers of executions require the execution engine to be distributed in order to scale. Configuring such a distributed engine, however, is very difficult, and a configuration that is optimal for one type of workload may be unsuitable for another. For this reason, we have added an autonomic controller to a distributed workflow execution engine. This controller monitors the behavior of the engine and adapts its configuration on the fly to best serve the current workload, adding self-configuration and self-optimization properties. The controller also heals the system in case of failures, adding self-healing properties.
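The abstract reports mapping a workflow to a standardized service interface but does not name the concrete standard here. The following is a minimal sketch of the underlying idea only, hiding a workflow behind a generic interface so that clients need not reproduce its environment; it assumes a plain HTTP/JSON endpoint and a hypothetical run_workflow helper, not the book's actual mapping.

import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def run_workflow(parameters):
    """Hypothetical stand-in for enacting the underlying workflow."""
    # A real engine would run the workflow graph; here we just echo.
    return {"status": "finished", "inputs": parameters}


class WorkflowService(BaseHTTPRequestHandler):
    """Exposes the workflow behind a generic HTTP/JSON interface."""

    def do_POST(self):
        # Clients submit workflow parameters as a JSON document.
        length = int(self.headers.get("Content-Length", 0))
        parameters = json.loads(self.rfile.read(length) or b"{}")
        body = json.dumps(run_workflow(parameters)).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


if __name__ == "__main__":
    HTTPServer(("localhost", 8080), WorkflowService).serve_forever()

A client then only needs to POST its parameters as JSON; the workflow implementation and its environment stay on the server side, which is the point of sharing computations as services.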
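The autonomic controller is characterized above only by its self-configuration, self-optimization, and self-healing properties. Below is a hedged sketch of the kind of monitor-analyze-plan-execute loop such a controller could run; the Engine class, its metrics, and the thresholds are toy stand-ins for the real distributed engine, not the book's design.

import random
import time


class Engine:
    """Toy stand-in for a distributed workflow execution engine."""

    def __init__(self, workers=2):
        self.workers = workers
        self.queue = 0

    def metrics(self):
        # Monitoring hook: a real engine would expose these via sensors.
        arriving = random.randint(0, 8)             # new workflow runs
        served = 2 * self.workers                   # toy service rate
        self.queue = max(0, self.queue + arriving - served)
        return {"queue": self.queue, "healthy": random.random() > 0.05}

    def set_workers(self, n):
        self.workers = max(1, n)                    # reconfigure on the fly

    def restart_failed(self):
        pass                                        # self-healing hook


def control_loop(engine, high=10, low=2, steps=20, period=0.1):
    """Monitor the engine and adapt its configuration to the workload."""
    for _ in range(steps):
        m = engine.metrics()                        # monitor
        if not m["healthy"]:
            engine.restart_failed()                 # self-healing
        elif m["queue"] > high:                     # analyze and plan
            engine.set_workers(engine.workers + 1)  # scale out (execute)
        elif m["queue"] < low and engine.workers > 1:
            engine.set_workers(engine.workers - 1)  # scale in (execute)
        print(f"queue={m['queue']:3d} workers={engine.workers}")
        time.sleep(period)


if __name__ == "__main__":
    control_loop(Engine())

The design point is that scaling decisions are taken continuously against observed load rather than fixed once at deployment, which is what makes a single configuration serve changing workloads.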
Similar resources
Architectural Plan for Constructing Fault Tolerable Workflow Engines Based on Grid Service
In this paper, the design and implementation of a fault-tolerant architecture for scientific workflow engines is presented. The engines are assumed to be implemented as composite web services. Current architectures for workflow engines make no provision for substituting faulty web services with correct ones at run time; the difficulty is to roll back the execution state of the workflow... (see the rollback sketch after this list)
A Clustering Approach to Scientific Workflow Scheduling on the Cloud with Deadline and Cost Constraints
One of the main features of high-throughput computing systems is the availability of high-power processing resources. Cloud computing systems can offer these features through concepts such as pay-per-use and Quality of Service (QoS) over the Internet. Many applications in cloud computing are represented by workflows, and Quality of Service is one of the most important challenges in the context of scheduling... (see the scheduling sketch after this list)
Deploying Kepler Workflows as Services on a Cloud Infrastructure for Smart Manufacturing
21st Century Smart Manufacturing (SM) is manufacturing in which all information is available when it is needed, where it is needed, and in the form in which it is most useful [1,2] to drive optimal actions and responses. The 21st Century SM enterprise is data-driven, knowledge-enabled, and model-rich, with visibility across the enterprise (internal and external) such that all operating actions are determined...
A SOA-Based Environment Supporting Collaborative Experiments in E-Science
Many sophisticated environments allow the creation and management of scientific workflows, where the workflow itself is provided as a service. Scientific Grids handle large amounts of data and share resources, but the implementation of service-based applications that use scientific infrastructures remains a challenging task due to the heterogeneity of Grid middleware and different programming models...
Workflow Engine for Clouds
A workflow models a process as a series of steps, which simplifies the execution and management of complex applications. Scientific workflows in domains such as high-energy physics and the life sciences utilize distributed resources to access, manage, and process large amounts of data at a higher level of abstraction. Processing and managing such large amounts of data requires the use...
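The fault-tolerance entry above poses the problem of rolling back a workflow's execution state so that a faulty service can be replaced by a substitute at run time. The following is a toy sketch of that idea only; the function invoke_with_rollback, the dict-based state, and the candidate-list protocol are illustrative assumptions, not the paper's design.

import copy


def invoke_with_rollback(state, candidates, payload):
    """Try each candidate service; restore the checkpoint after a failure."""
    for service in candidates:
        checkpoint = copy.deepcopy(state)   # save the execution state
        try:
            state["last_result"] = service(payload)
            return state                    # success: keep the new state
        except Exception:
            state.clear()
            state.update(checkpoint)        # roll back, try a substitute
    raise RuntimeError("all candidate services failed")


def broken(payload):
    raise IOError("service down")           # simulated faulty web service


def backup(payload):
    return payload.upper()                  # correct substitute service


print(invoke_with_rollback({"step": 3}, [broken, backup], "sequence"))
# -> {'step': 3, 'last_result': 'SEQUENCE'}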
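The clustering-and-scheduling entry above trades deadline against cost when mapping workflow tasks to cloud resources. Below is a minimal sketch of that trade-off for a single cluster of tasks; the resource names, speeds, and prices are made up for illustration and do not come from the paper.

def cheapest_within_deadline(work_units, deadline_h, resources):
    """Pick the cheapest resource type that still meets the deadline."""
    feasible = [(r["price"] * work_units / r["speed"], r["name"])
                for r in resources
                if work_units / r["speed"] <= deadline_h]
    if not feasible:
        raise ValueError("no resource type meets the deadline")
    cost, name = min(feasible)
    return name, cost


resources = [
    {"name": "small", "speed": 1.0, "price": 0.1},   # work units/h, $/h
    {"name": "large", "speed": 4.0, "price": 0.5},
]
print(cheapest_within_deadline(work_units=8, deadline_h=3, resources=resources))
# -> ('large', 1.0): 'small' is cheaper per hour but would need 8 h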
Publication date: 2009